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Abstract. With Q q , n the distribution of n minus the rank of a matrix 
M n chosen uniformly from Mat(n, q), the collection of all n x n matrices 
over the finite field ¥ q of size q > 2, and Q q the distributional limit of 
Qq.n as n — > oo, we apply Stein's method to prove the total variation 
bound 

< \\Q q ,n - Q q \\rv <-^rr- 



In addition, we obtain similar sharp results for the rank distributions of 
symmetric, symmetric with zero diagonal, skew symmetric, skew cen- 
trosymmetric, and Hermitian matrices. 



1. Introduction 

We study the distribution of the rank for various ensembles of random 
matrices over finite fields. To give a flavor of our results, let M n be chosen 
uniformly from Mat(n, q), the collection of all n x n matrices over the finite 
field Fg of size q > 2. Letting Q q ^ n = n — rank(M n ), it is known (page 38 of 
[3]) that for all k in U n = {0, . . . , n}, 



(1) P(Q q ,n =k)= p k ,n where p k>r 



n-= fc+ i(i-^') 2 



g^II^iCi-rt 

Clearly for any fixed k € No, the collection of non negative integers, 

rwi-T^') 



(2) lim p k ^ n = p k where p k 



n— ¥oo 



g fe2 n- =1 (i-^) 2 



One of our main results, Theorem 1.1, provides sharp upper and lower 
bounds on the total variation distance between Q q ^ n , the distribution of 
Qq, n in (1), and its limit in (2), denoted Q q . Recall that the total variation 
distance between two probability distributions P\,P2 on a finite set S is 
given by 

(3) ||Pi - P 2 \\tv ■= \Y1 l p i( s ) " P 2(*)l = max \P\{A) - P 2 (A)\. 

seS 
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Theorem 1.1. For q > 2 and n > 1, 

( 4 ) ^TT^II^«-^II^^^T- 

The upper bound in Theorem 1.1 appears quite difficult to compute di- 
rectly by substituting the expressions for the point probabilities given in 
(1) and (2) into the defining expressions for the total variation distance in 
(3). On the other hand, use of Stein's method [27], [10] makes for a quite 
tractable computation. In Sections 4 through 7 we also apply our methods 
to ensembles of random matrices with symmetry constraints, in particular, 
to symmetric, symmetric with zero diagonal, skew symmetric, skew cen- 
trosymmetric, and Hermitian matrices. 

Next we give five pointers to the large literature on the rank distribution 
of random matrices over finite fields, demonstrating that the subject is of 
interest. First, one of the earliest systematic studies of ranks of random ma- 
trices from the finite classical groups is due to Rudvalis and Shinoda [25], 
[26]. They determine the rank distribution of random matrices from finite 
classical groups, and relate distributions such as Q q of (2) to identities of 
Euler. Second, ranks of random matrices from finite classical groups appear 
in works on the "Cohen-Lenstra heuristics" of number theory; see [31] for the 
finite general linear groups and [23] for the finite symplectic groups. Third, 
the rank distribution of random matrices over finite fields is useful in coding 
theory; see [4] and Chapter 15 of [22]. Fourth, the distribution of ranks of 
uniformly chosen random matrices over finite fields has been used to test 
random number generators [13], and there is interest in the rate of conver- 
gence to Q q . Fifth, there is work on ranks of random matrices over finite 
fields where the matrix entries are independent and identically distributed, 
but not necessarily uniform. For example the paper [9] uses a combination 
of Mobius inversion, finite Fourier transforms, and Poisson summation, to 
find conditions on the distribution of matrix entries under which the proba- 
bility of a matrix being invertible tends to po as n — > oo. Further results in 
this direction, including rank distributions of sparse matrices, can be found 
in [5], [11], [12], [19]. 

The organization of this paper is as follows. Section 2 provides some 
general tools for our application of Stein's method, and useful bounds on 
products such as Yl^l — 1/V)- The development followed here is along the 
lines of the "comparison of generators" method as in [17] and [18]. Section 3 
treats the rank distribution of uniformly chosen n x n matrices over a finite 
field, proving Theorem 1.1. Section 4 treats the rank distribution of random 
symmetric matrices over a finite field. Section 5 provides results for the rank 
distribution of a uniformly chosen symmetric matrix with diagonal; these 
are called "symplectic" matrices in Chapter 15 of [22], which uses their rank 
distribution in the context of error correcting codes. The same formulas 
for the rank distribution of symmetric matrices with zero diagonal also ap- 
ply to the rank distribution of random skew-symmetric matrices, when q is 
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odd. Section 6 treats the rank distribution of random skew centrosymmetric 
matrices over finite fields, and Section 7 treats the rank distribution of ran- 
dom Hermitian matrices over finite fields. The appendix gives an algebraic 
proof of the crucial fact (proved probabilistically in Section 3) that if Q n 
has distribution Q q . n of (1), then E(qfi n ) = 2- l/q n . 

In the interest of notational simplicity, in Sections 4 through 7, the specific 
rank distributions of the nxn matrices of interest, and their limits, will apply 
only locally in the section or subsection that contains them, and will there 
be consistently denoted by Q q>n and Q q , respectively. 

2. Preliminaries 

We begin with a general result for obtaining characterizations of discrete 
integer distributions. Our resulting identities are in the spirit of Proposition 
2.1 and Corollary 2.1 of [17], but of a somewhat simpler form not involving 
forward differences; see also [20]. We say a nonempty subset I of the integers 
Z is an interval if a, b £ I with a < b then [a, b] fl Z C I. Let C(X) denote 
the distribution of a random variable X. 

Lemma 2.1. Let {r^^k £ 1} be the distribution of a random variable Y 
having support the integer interval I. Then ifa(k) and b(k) are any functions 
such that 

(5) a(k)rk-i = b(k)rk for all k £ Z, 

then a random variable X having distribution C(Y) satisfies 

(6) E[a(X + l)f(X + 1)] = E[b(X)f(X)} 

for all functions f : Z — > IR for which the expectations in (6) exist. 

Conversely, if a(k) and b(k) satisfy (5) and a(k) / for all k G I then 
X has distribution C(Y) whenever X has support I and satisfies (6) for all 
functions f{x) = l(x = k), k € I. 

When Y has support No then k £ Z in (5) may be replaced by k £ No, 
while if Y has support U n = {0, 1, . . . , n} for some n £ No, then (5) may be 
replaced by the condition that (5) holds for k £ U n and that a(n + 1) = 0. 

Proof First suppose that (5) holds and that C(X) = C(Y). Then for all 

k £ Z, 

E(a(X + 1)1(X + 1 = k)) = a{k)P{X = k - 1) 

= a(fc)r fc _i 

= b(k)rk 

= b{k)P(X = k) 

= E(b(X)l(X = k)). 

Hence (6) holds for fix) = l(x = k),k £ Z. By linearity, (6) holds for 
all functions with finite support, and hence for all the claimed functions by 
dominated convergence. 
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Conversely, if (6) holds for X with f(x) = l(x = k) for k £ I then 

a{k)P(X = k - 1) = b(k)P(X = k). 

Hence, using that a(/c) / 0, rj, / for G I and that X has the same 
support as Y yields 

P(X = k - 1) _ b(k) _ r fc _x 
P(X = k) a{k) r k ' 

If 1= {s,...,t}, then for j el 

P(X=j) = A P(X = fc-l) = r, 

Summing over j £ I yields P(X = t) = r t , and hence P(X = j) = rj, 
showing C{X) = C(Y). One may argue similarly for the remaining cases 
where I is an unbounded integer interval. 

Lastly, when the support of Y is a subset of No then (5) holds trivially 
for k No, and when Y has support U n = {0, 1, . . . , n} then (5) also holds 
trivially for k > n + 2, and at k = n + 1 when a(n + 1) = 0. □ 

For example, when Y ~ V(X), the Poisson distribution with parameter 
A, then = e~ x \ k /kl, and we obtain 

— = v for all fc € N . 

rk A 

Setting = k and a(/c) = A yields the standard characterization of the 
Poisson distribution [2], 

E[Xf(Y + l)]=E[Yf(Y)]. 

Of particular interest here is the characterization (6) of Lemma 2.1 for 
limiting distributions Q q with distribution P(Q = k) = having support 
No- In this case, when applying Lemma 2.1 we take a(k) > for all k € No, 
whence 6(0) = by (5), and let the values of a(k) and b(k) for k £ No 
be arbitrary. For such functions a(k) and b(k) we consider solutions / to 
recursive 'Stein equations' of the form 

(7) a(k + l)/(fc + 1) - b(k)f(k) = h(k) - Q q h for k G No, 

where Q q h = Eh(Q). 

Solving (7) for f(k),k € No when the functions a(k),b(k) satisfy only 
b(0) = and a(k) > one may take /(0) = arbitrarily, and easily verify 
that the remaining values are uniquely determined and given by 

(8) f(k + 1) = £ ( lliZ^i ) TO - Qih} for k € N . 

j=0 \lll=j+l a \ l ) J 
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In the case where the distribution {p k , fc £ No} with support No satisfies (5) 
with pk replacing r k , the solution (8) simplifies to 

k 

_ EKkjQ) ' Q ,k W Q<k)] fa 
a(k + l)p k 

In particular, for = l(k e A) with A C No and C4 = {0,1, ... ,k}, 

as in Barbour et al. [2], Lemma 1.1.1, for k € No the solution /a satisfies 

, Q , P(QeAnU k )P(QeU c k )-P(QeAnU c k )P(QeU k ) 

a(fc + 

implying 

with equality when A = U k . 

Lemma 2.2. Zei Q /iaue distribution {p k ,k £ No} u>ii/i p& > for all 
k G No, and Zei a(k),b(k) satisfy (5) with p k replacing r k , and for A C No 
let $a be the solution to (7) given by (9). Then 

\um< p{Q> - l) 



a(l) 

Proof. Prom (9) with k = we obtain 

, n . P(gein tf )P(Q > l) - P(Q e An U§)P(Q = 0) 

a(l)p 

If A 5 then 

P(Q = 0)P(Q > 1) - P(Q € A \ {0})P(Q = 0) 



I/a(i)| 



a(l)p 

P(Q>i)-P(Qe A\{0}) < P(Q > l) 
a(l) ' " a(l) : 

while if A ^ then again 

^ = P(Q £ A)P(Q = 0) ^ P(Q > 1) 



a(l)p " a(l) 

Lemma 2.3 collects some bounds that will be useful. 
Lemma 2.3. For q > 2 one has 



□ 



11(1 - lAf)>l-l/a-l/o 2 , 



i=i 
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H(l - l/q') > 1 - 1/q - l/q 2 + l/q 5 + 1/q 7 - 1/q 12 - 1/q 15 , 

i>l 

H (1 - l/q 1 ) > 1 - 1/q - 1/q 3 and ]J (1 - 1/q 1 ) > 1 - 2/q 3 . 



i>l i>3 
i odd i odd 



Proof. The first claim is Lemma 3.5 of [24], and arguing as there yields the 
second claim. Thus 

rwi - 1/?') 



n>-w> > ^t_ Vq2 

i>i 1 ^ 

i odd 

1 - 1/q - 1/q 2 + l/q 5 + l/q 7 - 1/q 12 - l/q 15 



~ 1 - l/q 2 

> i_ l/ q _i/ ? 3 > 

where the last inequality holds since 

(1 - l/q - l/q 2 + 1/q 5 + l/q 7 - l/q 12 - l/q 15 ) 

- (1 - l/s 2 )(l - 1/q - 1/q 3 ) = q8 ~f 5 ~\ 

which is positive for q > 2. The final claim now follows by applying the 
third to obtain 



i odd 

and using that q > 2. □ 
We will also apply the inequality 

n n 

(11) U(i- ai )>i-j2^ 

i=i i=i 

valid for a, G [0, 1], i = 1, . . . , n, and easily shown by induction. 

3. Uniform matrices over finite fields 

In this section we study the rank distribution of matrices chosen uniformly 
from Mat(n, q) and take the distributions Q q and Q qt n as in (2) and (1) 
respectively; throughout this section we take q > 2. The goal of this section 
is to prove Theorem 1.1. The following lemma is our first application of the 
characterizations provided by Lemma 2.1. 

Lemma 3.1. If Q has the Q q distribution then 

(12) E[qf(Q + l)]=E[(qQ-l) 2 f(Q)] 
for all functions f for which these expectations exist. 



(14) J ^ L = _ n ' for all k € U n . 
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If Qn has the Q q , n distribution then 
(13) E [5(1 - g- n+Q ")/(Q n + 1)] = E - l) 2 f(Q n )] 

for all functions f for which these expectations exist. 
Proof. From (2) we obtain 

Ehzl = (l^L forall^No. 
Pk q 

An application of Lemma 2.1 with a(k) = q and b(k) = (q k — l) 2 yields (12). 
Similarly, from (1) we obtain 

Pk-i,n _ (q h - i) 

Pk,n g(l - q~ 

An application of Lemma 2.1 with a(k) = q(l - q~ n+k ~ l ), b{k) = (q k — l) 2 , 
noting a(n + 1) = 0, yields (13). □. 

Here we provide a proof of the following moment computation using the 
characterization (13); an algebraic proof of Lemma 3.2 appears in the ap- 
pendix. 

Lemma 3.2. If Q„ has the Q qt71 distribution on U n = {0, 1, • • ■ ,n} given 
by (1), then 

E(q Qn ) = 2-l/q n . 

Proof. Applying the characterization (13) with the choice f{x) = q kx we 
obtain 

E[q(l - g -«+Q™) g MQn+i)] = E ^ q Q n _ ifgkQn} 
Letting ct = Eq k ® n yields the recursion 

(15) c k+2 = (2 - q- n+k+1 )c k+l + (q k+1 - l)c k . 

Since Q q;n is a probability distribution, c$ = 1. Setting k = —1 in (15) 
yields c\ = 2 — q~ n . □ 

Remark: Letting b k = Eq k ®, one similarly obtains the recursion 

b k+1 = 2b k + {q k - l)6 fc _i 

using (12). As &o = 1 ; applying the recursion with k = yields b± = 2. 
One concludes that b k is the k th Galois number, the number of subspaces 
of a /c-dimensional vector space over ¥ q . A combinatorial proof of this fact 
appears in [16]. 

In the remainder of this section we consider the Stein equation (7), with 

(16) a(k) = q and b(k) = (q k - l) 2 , 

for the target distribution Q q , and for A C No we let /a denote the solution 
(9) when h(k) = l(k € A). 
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The following lemma is crucial, and should be useful for other applications 
of Stein's method to the distribution Q q of (2). For a function / : No — >■ M, 
let 



sup 

fcGNo 



Lemma 3.3. The solution Ja satisfies 



sup \\f A \\ < i + 3- 
AcN 9 9 



Proof. As we may set /a(0) = it suffices to consider /a(^ + 1) for k G No- 
By Lemma 2.2, for all A C N 



P(Q > i) 
q 

1 -po 



i i -s (t -^ 



l l 



where we have applied the inequality rii>i(l ~^/q 1 ) > 1 — — l/<7 2 > from 
Lemma 2.3. 

Now consider the case k > 1. By (10) and (16) we have 



(17) 
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and by neglecting the term P(Q G Uk) in (17) and applying (2) we obtain 



j = l v * ' Z=fc+1 y J-li = lV x 



< 



< 



oo 1 
t,2i x—v 1 



a*- 1 V 

* „p n / n _ i n 2 



ij=k+l\^ q 3 > l=k+l" 

h' 2 — 1 00 

g* 1 x - i 



I I - V x J-l 2 ^ r/' 

I I 2-,j=fc+l qJ ) i=k+1 1 



k 2 -l 



OO j 



I 1 - J Z = fc + 1 q 

k 2 00 ^ 

i 00 1 

l^ 1 q^T)) 1=1 i 
1 °° J 



where for the second inequality we have applied (11). 
Hence, for k > 1, using g > 2, we obtain 



1 00 1 

|/^ + i)| <- r^E^ 



thus completing the proof. □ 
We now present the proof of Theorem 1.1. 



Proof. We first compute the lower bound on the total variation distance by 
estimating the difference of the two distributions at k = 0. In particular, by 
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(3), (1) and (2) 



\Qq,n — Qq\\TV 



> 



bo,n - Po] 



Ki<n 



Ki 



> _ [(1 _ ... (1 - - (1 - 



(1 - l/g^ 1 )] 



1 



> 



> 



2g™+ 1 
1 

2q n + 1 
1 



(1 - 1/g) ••• (1 - I/O 
(1 " 1/9 " 1/9 2 ) 



8g n+1 ' 

The third inequality used Lemma 2.3, and the last that q > 2. 
For the upper bound, 

\P(Q n e A) - P{Q e A)\ = \E[h A (Q n )} - Q q h A \ 

= \E[qf A (Q n + l)-(q^-l) 2 f A (Q n )]\ 
= \E[q- n+ ^ +1 f A (Q n + l)}\ 
< \\f A \\Eq- n+ ^ +1 , 

where we have applied (13) in the third equality. Applying Lemmas 3.3 and 
3.2 gives that 



||^||^-" +Q " +1 <(^ + ^3 



-n+1 



1 

2 



< 



2 1 + 



7 n+l 



< 



,n+l ' 



Now taking the supremum over all A C No and applying definition (3) 
completes the proof. □ 
Remark: The limit distribution Q q also arises in the study of the dimen- 
sion of the fixed space of a random element of GL(n,q). More precisely, 
Rudvalis and Shinoda [25] prove that for k fixed, as n — > oo the probability 
that a random element of GL(n, q) has a k dimensional fixed space tends to 
Pk- See [16] for another proof. 



4. Symmetric matrices over finite fields 

Let S be the set of symmetric matrices with entries in the finite field ¥ q 

(where q is a prime power). Clearly |5| = q^ 2 > . The paper [6] determines 
the rank distribution of a matrix chosen uniformly from S when q is odd, 
and the paper [21] determines this distribution for q both odd and even, 
given by (19). 
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Throughout this section q > 2, and we let Q q be the distribution on No 
with mass function 

n *>i (i - W) 

(18) Pk = , 

nti(^-i) 

and for n € No we let Q q:Tl be the distribution on U n = {0, . . . , n} with mass 
function 

(m\ N(n,n-k) 

(19) p k ,n = t^+Tn where 

q{ 2 ) 

h 2i 1h-\ 

N(n, 2h) = H 2 f_ J] - 1) for 2h < n, and 

i=l ^ ' j=0 

jV(n, 2/t + 1) = J] 2 f_ J]^"' - *) fo^/i+l^n. 

i=i ^ ' i=0 

Theorem 4.1. 7/n is even, we have 
i 

I/n is odd, we have 



.18 2.25 



.18 2 

^2 < ||Q,,n-e,||2V < 

We again begin by using Lemma 2.1 to develop characterizations for the 
two distributions of interest. For n G No we let l n = l(n is even). 

Lemma 4.2. has the Q q distribution then 

E\f(Q + l)]=E[(qQ-l)f(Q)] 

for all functions f for which these expectations exist. 
If Qn has the Qq, n distribution then 

(20) E[(l - l n - Qn q^ n - Q ^)f(Q n + 1)] = E[(qQ» - l)/(Q n )] 
for all functions f for which these expectations exist. 

Proof. By taking ratios in (18) we obtain 

(21) = g * _ i. 

Pk 

Setting a(k) = 1 and b(k) = q k — 1 applying Lemma (2.1) yields the first 
result. 

If n and k are of the same parity then n — k = 2h for some h, and we 
have 

Pk-i,n _ N(n,n-k + l) _ N(n,2h + 1) _ 2h _,_k_, h(zTT 
p Kn ~ N(n,n-k) ~ N(n,2h) ~ Q ~ L ~ q 

In this case we set a(k) = 1 and b(k) = q k — 1. 
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If k and n are of opposite parity, then n — k = 2h + 1 for some h and we 
obtain 

Pfc-i,n = N(n,n-k+l) = N(n,2(h + 1)) _ g 2(fe+1) , n _ 2h _ x _ , 
p Kn N(n,n-k) N(n,2h + 1) q 2 (h+i) _ i W J 

n— fe+l „fe _ 1 

= ^TTi t(^-1) = ^ ITT iOT k€U n . 

qn—k+1 _ ^ v ^ ' ]_ _ g— n+fe— 1 " 

In this case we set a(/c) = 1 — and = q k — 1. 

Writing a(/c) = 1 — l ra _fc+ig _n+fc_1 and 6(/c) = q k — 1 combines both cases. 
Noting that a[n + 1) = an application of Lemma 2.1 completes the proof. 
□ 

Lemma 4.3. // Q n has distribution Q q ^ n then 

El n - Qn qQ" = 1. 
Proof. Setting /(x) = l n - x in (20) yields 

E[(l - ln-Q^-^ln-Q^} = E[(<fl~ - l)ln- Qn \. 

Since l n _Q n l n _Q n _i = 0, we obtain 

E[l n _ Q „_ 1 ] = E[(^«-l)l n _ 0n ], 
and rearranging yields 

E[l n - Qn q Q "} = £?[ln-Q„-l] + £[ln-Q„] = 1, 

as claimed. □ 
In the remainder of this section we consider the Stein equation (7) for the 
target distribution Q q with 

(22) a(k) = 1 and b(k) = q k - 1, 

and for A C No we let /a denote the solution (9) when h(k) = l(k G A). 

Lemma 4.4. The solution /a satisfies 

11 2 

sup |/a(1)| < - + and sup < 

AcNo 9 9 AcN ,fc>2 9 

Proo/. By Lemma 2.2, for all A C N , 

|/a(1)| < P(Q>1) 
= 1 -Po 

- 1- n o-?> 

i>l,i odd 

l l 

^ - + -' 

9 r 

where we applied the third inequality in Lemma 2.3. 
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For k > 1, using (10) and (18), 



\f A {k + l)\ < 



Pk 

k 



W - u E n , ' _ » 

i=l j=fc+illi=lW J-J 



=fe+i 
1 



< 



z£i nU+i^-i) 

oo 

E 

1=1+1 ^ ' ' " "' lli=fe+l 

1 °° 1 

n^+iCl-?"*) g('('+i)-*(*+i))/2 



g(i(i+i)-fc(fc+i))/2 ni =fc+1 (i- g - 



i 



=k+l 

oo 



i 



< 



a= fe+ i(i-^) 



ij=fe+i 

In particular, for all k > 1 we obtain 



1_ ^ 1_ 

1=2 q 

( 1 , 1 ^ 1 \ 

+ g 2fc gi(J + l)/2 j 



i / 00 I \ 

Mt+lil ^o-rt l 1+ S^J- 

and the proof is now completed by using the fact that for all q > 2 

nHrr^^E^s™.^,^. 

The upper bound on the first factor used the second assertion of Lemma 
2.3. Indeed, 

n _j 1- 1/q - 1/q 2 + 1/q 5 + 1/q 7 - 1/q 12 - 1/q 15 

ll [ Q ) ~ 1 — 1/q 

i>2 1 H 

The upper bound on the second factor used that 

OO -J oo 

1=2 y Z=5 

= 1 + 1/2 3 + 1/2 6 + 1/2 10 + 2/2 15 < 1.142. 

□ 

We now present the proof of Theorem 4.1. 
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Proof. For the lower bound one computes from the formula for j>o,n m (19), 
in the case n = 2m is even, that 



P0,n 



N(n,n) 



(1-1 M 1-1/^)... (1 _ W) -Q 

i=l q 



-2i 



= (l-l/gKl-l/^-.-a-l/g"" 1 ). 



Thus the total variation distance between Q q>n and Q Q is at least 



-^\po,n -Po] 

1 I", X w 1 x 

a ip-iKi-^-d- 

1 



> 



> 



2g n + 1 
1 

2< 7 ™+ 1 
.18 



(l-l/g)(l-l/ 9 3 )---(l-l/g 
(1_ 1/g-l/g 3 ) 



n— 1 > 



(i-^+r) 



- „n+l ' 



The second inequality used Lemma 2.3, and the final inequality that q > 2. 
When n = 2m + 1 is odd, we obtain similarly that 



> 



[P0,n ~ Po] 
(1--)(1 



g- 1 q n q q a 



~,n+2 - 



> 



> 



2q n+2 
1 

2<f i + 2 
.18 



(l-l/g)(l-l/ g 3)...(l_l/ g r. 
(1 - 1/q - l/q 3 ) 



- qn+2 ■ 
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To prove the upper bound, for any A C No we have 
\P(Q n G A) - P{Q G A)\ 
= \E[h A {Q n )]-Q q h A \ 
= \E[f A (Q n + l)-(q^-l)f A (Q n )}\ 
= \E[l n _ Qn q-( n -^f A (Q n + l)}\ 

< l n q- n \f A (l)\P(Q n = 0) 

+ \E[ln-Q n q' (n ~ Qn) fA(Qn + 1)1(Q„ > 1)]| 

< l n g- n |/ A (l)| + E[l n „ Qn q-^"h(Q n > 1)] sup \f A (k)\ 

k>2 

< l n q- n \f A (l)\+E[l n _ Qn q-^-^]sup\f A (k)\ 

k>2 

= l n q- n \f A (l)\+q- n sup \f A (k)\ 

k>2 

and the result easily follows. The last two steps used Lemmas 4.3 and 4.4, 
respectively. 

□ 

5. Symmetric matrices over finite fields with zero diagonal 

This section treats the rank distribution (24), (29) of a random symmetric 
matrix with zero diagonal over a finite field ¥ q , when q is a power of 2. 
Such matrices were termed "symplectic" in [22], which studied their rank 
distribution in the context of coding theory. We remark that by [7] and 
elementary manipulations, the quantity N(n, 2h) defined in (24) below is 
also equal to the number of n x n skew-symmetric matrices of rank 2h 
(where now q is odd), so our results also apply in that context. We also 
mention that the two limiting distributions studied in this section arise in 
the work of the number theorist Swinnerton-Dyer on 2-Selmer groups [29]. 
We consider the cases where n is even and odd separately. 

5.1. Case of n even. Throughout this subsection, let n = 2m, an even, 
non-negative integer, and with q > 2, let Q q be the distribution on No with 
mass function 

2k 

(2 3 ) » = nc- w> * . 

i>l,odd lli=lW L ) 

For n G No let Q q ^ n be the distribution on U m = {0, . . . , m} with mass 
function 

ATI 9£A ^ 2i—2 2ft— 1 

(24) Pk ,n = ] where N(n, 2h) = ]J \— ]J (q^ - 1). 

g Uj i=i q _1 i=o 
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Theorem 5.1. We have that 

.18 1.5 

^+1 < \\Qq,n ~ Qq\\TV < 

We begin the proof of Theorem 5.1 by developing characterizations of the 
two distributions of interest. 

Lemma 5.2. If Q has the Q q distribution then 

E[q 2 f(Q + 1)] = EKq 2 ®- 1 - l)(q 2 Q - 1)/(Q)] 

for all functions f for which these expectations exist. 
U Qn has the Qq t n distribution then 

(25) E[(q 2 - q- 2 ^-^-^)f(Q n + 1)] = E[{q 2Qn ~ l - l)(q 2 ^ - l)f(Q n )\ 

for all functions f for which these expectations exist. 

Proof. By taking ratios in (23) we obtain that for k £ No 

(2g) Pk-l = (g^-l _ l )(g 2fe _ 1} 

Pk q 2 

Setting a(k) = q 2 and b(k) = (q 2k ~ 1 — l)(q 2k — 1), applying Lemma 2.1 yields 
the first result. 

Similarly, the second claim can be shown using Lemma 2.1 and (24) to 
yield 

Pfc-i,n = N(2m,2{m-k + l)) = {q 2 ^ 1 - l)(q 2k - 1) 
p k)H N(2m, 2(m — k)) q 2 - g -2(m-fc) ' 

upon setting a(k) = q 2 — q- 2 ( m ~ k ) and b(k) = (q 2k ~ l — l)(q 2k — 1), noting 
that a(m + 1) = 0. □ 

Lemma 5.3. If Q n ~ Qq,n then 

Eq 2Qn =q + l-q- n+1 . 
Proof. For k any integer, letting f(x) = q kx in (25) yields 

E[(q 2 - g- 2 ( m -Q™- 1 ))g fe ( ( 9™+ 1 )] = E[(q 2Qn ~ l - l)(q 2Qn - l)q kQn }. 
Setting Cfc = Eq kQn , this identity yields 

q' 1 ^ - (1 + q- 1 - q~ 2m+2+k )c k+2 + (1 - q k+2 )c k = 0. 
Substituting k = —2 and using that cq = 1 we obtain 
g- 1 c 2 -(l + g" 1 -g- 2m ) = 0, 

so that 

C2 = q (l + q- 1 - q - 2m ) =q+l- q~ 2m+1 . 

□ 

In the remainder of this subsection we consider the Stein equation (7) for 
the target distribution Q q with 

(27) a{k)=q 2 and b(k) = {q 2k ~ l - \){q 2k - 1), 
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and for A C No we let Ja denote the solution (9) when h(k) = l(k € 

Lemma 5.4. The function f a satisfies 

11 1 31 

sup |/a(1)| < + and sup \fA(k)\ < - L =-. 

AcN I 6 <f AcN ,fc>2 q 7 

Proof. By Lemma 2.2, for all A C N , 

P(Q > 1) 



I/a(i)I < 



9 2 



1 - 




g 2 




(;- 


q 2 1 


? ( 




1 1 

^3+^5' 



where the second inequality used Lemma 2.3. 
For k > 1, by (10) and (23), 



l/A(fc+l)| < 



<? 2 Pfc 



« 2fc+2 ^n-ii^-i) 

J_ - g 2(i-fc) 



1 oo 



J-k 



< 



■ , ,^^ 2 - fe2) n 2 W(i-^) 
i 00 </~ fc 



^ 00 

fc+T + 



< 



<? 2 ns 2fc+ i(i-^) 

Hence for all > 1 we obtain 



( 1 , Lf M 



1 / 1 00 1 \ 

l,#ti|l V[E,i.-,-) l 1+ ?SH' 
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and the proof is now completed using the fact that for all q > 2 
1 / 1 °° 1 \ 

nsi^y ( 1+ ?g^J - <i-™^) < i.3i. 

The upper bound on the first factor used part 2 of Lemma 2.3. The upper 
bound on the second factor used that 



1- 



1 oo 1 
^3 X] n 2l 2 -l 



< 1 + 



oo ^ 

+ 9T21 



1=2 



2 3 I 2 6 

2 12+l 

\ 1=3 



1+: 



1 / 1 



2 \ 



2 e + 2 i5 y 



< 1.002. 



□ 



We now present the proof of Theorem 5.1. 
Proof. From the formula for po, n , one has that 



P0,r 



N(n, n) 



= (l-l/g)(l-l/g 3 )---(l-l/g- 1 ). 



The argument in the proof of Theorem 4.1 now shows that total variation 
distance between Q q ^ n and Q q is at least .18/q n+1 . 

For the upper bound, arguing as in the proof of Theorem 4.1 we obtain 

\P{Q n eA)-P(Q €A)\ 
= \E[h A {Q n )]-Q q h A \ 

= \E[q 2 f A (Q n + 1) - {q 2 ^ 1 - l){q 2 ^ - l)f A (Qn)}\ 
= \E[q- 2{m -^f A {Q n + l)]\ 

\P{Qn = 0) 

- 1} f A (Qn + l)l(Qn > 1)]| 

+ E[q-* m -Q»- 1 h(Q n >l)] S *p\f A (k)\ 

k>2 

-2(m-Q„-l)l 



< 



7 -2(m-l) 

+ |£[g~ 2(m ~ " 



< q 



-2(m-l) 



I/a(1) 



< 



< 



< 



-2(m-l)| /A(1) 
-2(m-l)| /A(1) 
-2(m-l)| /A (l) 



+ E[q-« m -V»-V] Sa p\f A (k)\ 

k>2 



+ q 



+ q 



-2(m-l) 



-2(m-l) 



(9+1 



sup I /a (fc) I 



fc>2 



(9 + 1) sup 



-n+2 



l + l)+1.31(g-« ; 



k>2 



+ q- n+2 )^ 



-(n+l) 



-(n+3) 



+ 1.31? 



-(n+4) 



+ 1.31g 



-(n+5) 



< 1.5g" 



-(n+l) 



as claimed. Note that Lemma 5.3 was used in the fourth equality, and 
Lemma 5.4 in the second to last inequality. 

□ 



STEIN'S METHOD AND THE RANK DISTRIBUTION 19 

5.2. Case of n odd. Throughout this subsection let n = 2m + 1, a positive, 
odd integer, and with q > 2, let Q q be the distribution on No with mass 
function 

„2fc+l 

(28) » = lid" W) ' ■ 

i>l,odd lU=l W ^ 

For n G No let Q q ^ n be the distribution on {0, ... , m} with mass function 

N(n, n - 1 - 2k) 



(29) p fc , n 



g(5) 



where N(n,2h) is given in (24). 

Our main result is the following theorem. 

Theorem 5.5. We /lave i/iai 

.37 2.2 

^2 < ||Q,,n-Q,||3V < 

We again begin by developing characterizing equations for the distribu- 
tions under study. 

Lemma 5.6. If Q has the Q q distribution then 

E[q 2 f(Q + 1)] = e[(q 2Q+1 - m 2Q - i)/(Q)] 

for all functions f for which these expectations exist. 
If Qn has the Q q , n distribution then 

(30) E[(q 2 - q' 2 ^' Q ^)f(Q n + 1)] = E[(q 2 ^ 1 - l)(q 2 ^ - l)f(Q n )\ 

for all functions f for which these expectations exist. 

Proof. By taking ratios in (28) we obtain 

Pfc-i (q 2h+1 -l)(q 2k -l) 



(31) 



Pk q 2 



Setting a(k) = q 2 and b(k) = (q 2k+l — l)(q 2k — 1), Lemma 2.1 yields the first 
claim. Similarly, the second can be shown by applying (29) to yield 

p fc _i, n N(n,2(m-k + l)) {q 2k+1 - l)(q 2k - 1) 



p Kn N(n, 2(m - k)) q 2 _ q -2( m -k) ' 

and then invoking Lemma 2.1 with a(k) = q 2 — g- 2 ( m ~ fc ) an d b(k) = (g 2fe+1 — 
l)("? 2fe ~~ 1); noting a(m + 1) = 0. □ 



Lemma 5.7. If Q n ~ Qq,n then 



Eq 2Q n = i + q -i 



q 



—n 
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Proof. For k any integer, letting f(x) = q kx in (30) yields 

E[{q 2 - ^-2(m-Q n -i)^fc(Q n +i)j = E[(q 2Qn+1 - l)(q 2Qn - l)q kQn }. 

Setting Cfc = E[q kQn ], this identity yields 

qc k+4 - (1 + q - q- 2m+2+k )c k+2 + (1 - q k+2 )c k = 0. 
Substituting k = —2 and using that Co = 1 we obtain 

qC2 _ (1 + q _ g -2m) = 0, 

so that 

C2 = g -l(l + g - g- 2m ) = 1 + q' 1 - q- 2m -\ 

□ 

In the remainder of this subsection we consider the Stein equation (7) for 
the target distribution Q q with 

(32) a{k)=q 2 and b(k) = {q 2k+1 - l){q 2k - 1), 

and for A C No we let /a denote the solution (9) when h(k) = l(k G A). 

Lemma 5.8. The function f a satisfies 

2 1.14 

sup |/a(1)| < -= and sup |/a(£0I < — q~- 

AcNo Q AcN ,fc>2 Q 

Proof. By Lemma 2.2, for all A C N , 

P(Q > 1) 



I/a(i)| < 



q 2 



1 -Po 




where the second inequality used Lemma 2.3. 
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For k > 1, by (10) and (28), 



l/A(fe + l)| < 



q 2 Pk 



i_ ~ ^C-fc) 

« 2 ^-^ 



< 



1 oo 1 

? «£i 9 2(,2 - fc2)+( "»nS +2 (i-?i 

1 oo 1 



< 



<? 2 a= 2fc+2 (i-^) 

Hence for all k > 1 we obtain 



1_ ~ 1_ 

Z= 2 y 

( 1 1 1 \ 



9 2 a= 2fc+2 (i-^) ^ 4fe+3 s^ 2+ ™ 
i 



i / i 00 1 \ 

and the proof is now completed by using the fact that for all q > 2, 

The inequality flSUU ~~ <7 _l )~ 1 < 1-137 is obtained by applying part 2 of 
Lemma 2.3. We also used that 



1^ 1 ^_1 

+ q Z> 2P+i - + 2 I 2 10 + ^ 2 18 

H 1=2 H \ 1=3 



□ 

We now present the proof of Theorem 5.5 
Proof. From the formula (29) for p ,n we obtain 

R,,n = N{n '?~ 1} = (1 - 1/5)(1 - l/^ 3 ) ■••(!- W)-^r- 
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Thus, now applying (28), the total variation distance between Q 9i „ and Q q 
is at least 



> 



> 
> 



;bo,n -Po] 



2(g-l) 



(l-I)(l-l)...(l--L)-(l-i)(l-l)...(l 



2(9 - l)q n+2 



q> q' 
(l-l/9)(l-l/9 3 ) 



2q n+2 
1 

2q n+2 
.37 

„n+2 • 



(1 _ 1/gS) ...(!_ 

(1 - 2/g 3 ) 



9 9" 

(1 - I/O 



7 n+2 - 



The second inequality used the last claim of Lemma 2.3. 

Arguing as for the proof of Theorem 4.1, for any A C No we have 

\P(Q n e A) - P(Q e A)\ 
= \E[h A (Q n )} - Q q h A \ 

= \E[q 2 f A (Q n + 1) - (q 2 ^ +1 - l){q 2 ^ - l)f A (Q n ))\ 
= \E[q- 2 ^-^-^f A {Q n + l)]\ 
< q- 2 ^\f A (l)\P(Q n = 0) 

■ 1} /A(Qn + l)l(Qn > 1)]| 
+ E [q-^ m -^- 1 h(Q n > l)]sup\f A (k)\ 

k>2 

-2(m-Q„-l)i 



+ \E[q 



-2(m—Q„- 



< q 

< a 



2(m- 

2(—D| /a(1) 

= 9- 2(m - 1} |/A(l) 
< q- 2(m - l) \fA(l) 



+ E[q-« m -^-'>]sup\f A (k)\ 

k>2 

+ q- 2 <- m - 1 \l+q- 1 - q- n )suj>\f A (k)\ 

k>2 

+ q - 2 (™-V (1 +q-i) sup \f A (k)\ 
k>2 



< 



„ n+3 + L14 ^-n+3 



-n+2^ 



1 



= 2q- {n+ V + 1.14^'( n+6 ) + 1.14g-( n+7 ) 
< 2.2g^ n+2 ) 

as claimed, where we have applied Lemmas 5.7 and 5.8 in the second to last 
equality, and inequality, respectively. □ 
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6. Skew centrosymmetric matrices over finite fields 

An n x n matrix A is called skew centrosymmetric if Aij = —Aji and 
Aij = A n+ i_j in+ i_j. This section studies the rank distributions (33) and 
(34) of a randomly chosen skew centrosymmetric matrix with entries in ¥ q 
for q odd. 

Suppose that n = 2m is even. Waterhouse [32] shows that the total 

2 

number of skew centrosymmetric matrices is q m , that all such matrices 
have even rank, and that the proportion of n x n skew centrosymmetric 
matrices of rank n — 2k is equal to 

N(n,n-2k) 
(33) p ktn = -5 , 



q" 



m—h—l ™ a h—l 

q — q J 



where N(n,2h)= J] _ j H^-j)- 



„„„-,. _ „j 

j=0 H H i=0 

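We claim that $p_{k,n}$ in (33) is exactly equal to the probability that a uniformly chosen $n/2 \times n/2$ random matrix with entries from $\mathbb{F}_q$ has rank $n/2 - k$. Indeed, pulling out factors of $q$, one can write (33) as
$$p_{k,n} = \frac{1}{q^{k^2}}\prod_{j=0}^{k-1}\left(\frac{1 - q^{j-n/2}}{1 - q^{j-k}}\right)\prod_{j=k+1}^{n/2}(1 - q^{-j}).$$

Comparing this expression with (1) with $n$ replaced by $n/2$ shows that it is sufficient to prove that
$$\prod_{j=0}^{k-1}\left(\frac{1 - q^{j-n/2}}{1 - q^{j-k}}\right) = \frac{\prod_{j=k+1}^{n/2}(1 - q^{-j})}{\prod_{j=1}^{n/2-k}(1 - q^{-j})}.$$

This identity holds since both
$$\prod_{j=0}^{k-1}(1 - q^{j-n/2})\prod_{j=1}^{n/2-k}(1 - q^{-j}) \quad\text{and}\quad \prod_{j=0}^{k-1}(1 - q^{j-k})\prod_{j=k+1}^{n/2}(1 - q^{-j})$$
are equal to $\prod_{j=1}^{n/2}(1 - q^{-j})$. Hence the following corollary is immediate from Theorem 1.1.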

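Corollary 6.1. For $q \ge 2$, let $\mathcal{Q}_q$ be the distribution (2) on $\mathbb{N}_0$, and for $n = 2m$ in $\mathbb{N}_0$ let $\mathcal{Q}_{q,n}$ be the distribution on $\{0, \dots, m\}$ with mass function (33). Then $\|\mathcal{Q}_{q,n} - \mathcal{Q}_q\|_{TV}$ satisfies the bounds (4) of Theorem 1.1 with $n$ replaced by $m = n/2$.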
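As a sanity check on the identity behind Corollary 6.1, the Python sketch below (ours, not from the source) compares (33) with the square-matrix law (1) at $m = n/2$ in exact rational arithmetic. It assumes that (1) has the closed form $p_{k,n} = q^{-k^2}\prod_{i=k+1}^{n}(1-q^{-i})^2 / \prod_{i=1}^{n-k}(1-q^{-i})$; the helper names `N_even`, `p33` and `p1` are hypothetical.

```python
from fractions import Fraction


def N_even(m, h, q):
    """N(n, 2h) for n = 2m, following the product formula after (33)."""
    val = Fraction(1)
    for j in range(m - h):  # j = 0, ..., m - h - 1
        val *= Fraction(q**m - q**j, q ** (m - h) - q**j)
    for i in range(h):  # i = 0, ..., h - 1
        val *= q**m - q**i
    return val


def p33(k, m, q):
    """Mass function (33) with n = 2m: proportion of rank n - 2k = 2(m - k)."""
    return N_even(m, m - k, q) / Fraction(q) ** (m * m)


def p1(k, n, q):
    """Assumed closed form of (1): P(n - rank = k) for a uniform n x n matrix."""
    x = Fraction(1, q)
    num = Fraction(1)
    for i in range(k + 1, n + 1):
        num *= (1 - x**i) ** 2
    den = Fraction(q) ** (k * k)
    for i in range(1, n - k + 1):
        den *= 1 - x**i
    return num / den


for q in (2, 3, 5):
    for m in range(1, 6):
        same = all(p33(k, m, q) == p1(k, m, q) for k in range(m + 1))
        print(q, m, same)
```

Exact `Fraction` arithmetic is used so that agreement means equality, not closeness.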

Now suppose that $n = 2m+1$ is odd. Waterhouse [32] shows that the total number of skew centrosymmetric matrices is $q^{m^2+m}$, that all such matrices have even rank, and that the number of $n \times n$ skew centrosymmetric matrices of rank $2h$ is equal to
$$N(n, 2h) = \prod_{j=0}^{m-h}\frac{q^{m+1} - q^j}{q^{m-h+1} - q^j}\prod_{i=0}^{h-1}(q^m - q^i).$$

Hence,
$$p_{k,n} = \frac{N(n, n-2k-1)}{q^{m^2+m}}, \quad k \in U_m = \{0, 1, \dots, m\}, \tag{34}$$
is the proportion of skew centrosymmetric matrices of rank $n - 2k - 1$. Theorem 6.2 below bounds the total variation distance between $\mathcal{Q}_{q,n}$, the distribution given in (34), and $\mathcal{Q}_q$, given by
$$p_k = \frac{\prod_{i \ge 1}(1 - 1/q^i)}{q^{k(k+1)}(1 - 1/q^{k+1})\prod_{i=1}^{k}(1 - 1/q^i)^2}, \quad k \in \mathbb{N}_0. \tag{35}$$

The main result of this section is the following theorem.

Theorem 6.2. For odd $n \ge 1$ and $q \ge 2$, we have
$$\frac{1}{4q^{(n+3)/2}} \le \|\mathcal{Q}_{q,n} - \mathcal{Q}_q\|_{TV} \le \frac{3}{q^{(n+3)/2}}.$$

We begin with the following characterization lemma.
Lemma 6.3. If $Q$ has the $\mathcal{Q}_q$ distribution, then
$$E[qf(Q+1)] = E[(q^Q - 1)(q^{Q+1} - 1)f(Q)] \tag{36}$$
for all functions $f$ for which these expectations exist.

If $Q_n$ has the $\mathcal{Q}_{q,n}$ distribution, then
$$E\left[\left(q - q^{Q_n+1-(n-1)/2}\right)f(Q_n+1)\right] = E\left[(q^{Q_n} - 1)(q^{Q_n+1} - 1)f(Q_n)\right] \tag{37}$$
for all functions $f$ for which these expectations exist.
Proof. For the first assertion, one calculates that
$$\frac{p_{k-1}}{p_k} = \frac{(q^k - 1)(q^{k+1} - 1)}{q}, \quad k \in \mathbb{N}.$$
Taking $a(k) = q$ and $b(k) = (q^k - 1)(q^{k+1} - 1)$ in Lemma 2.1, the first assertion follows.

For the second assertion, one calculates that
$$\frac{p_{k-1,n}}{p_{k,n}} = \frac{N(n, n-2k+1)}{N(n, n-2k-1)} = \frac{(q^k - 1)(q^{k+1} - 1)}{q - q^{k-(n-1)/2}}.$$
Taking $a(k) = q - q^{k-(n-1)/2}$ and $b(k) = (q^k - 1)(q^{k+1} - 1)$, and noting that $a(m+1) = 0$, the second assertion follows by Lemma 2.1. □
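The two ingredients of the second assertion, the ratio $p_{k-1,n}/p_{k,n}$ and the vanishing $a(m+1) = 0$, can be checked exactly for small $m$. The sketch below is illustrative only: `N_odd`, `p34`, `a`, `b` and `check_37` are our own names, and `check_37` evaluates both sides of (37) for an arbitrary test function $f$.

```python
from fractions import Fraction


def N_odd(m, h, q):
    """N(n, 2h) for n = 2m + 1, following the product formula preceding (34)."""
    val = Fraction(1)
    for j in range(m - h + 1):  # j = 0, ..., m - h
        val *= Fraction(q ** (m + 1) - q**j, q ** (m - h + 1) - q**j)
    for i in range(h):  # i = 0, ..., h - 1
        val *= q**m - q**i
    return val


def p34(k, m, q):
    """Mass function (34) with n = 2m + 1: proportion of rank n - 2k - 1."""
    return N_odd(m, m - k, q) / Fraction(q) ** (m * m + m)


def a(k, m, q):
    """a(k) = q - q^{k - (n-1)/2} from the proof of Lemma 6.3, with (n-1)/2 = m."""
    return q - Fraction(q) ** (k - m)


def b(k, q):
    """b(k) = (q^k - 1)(q^{k+1} - 1)."""
    return (q**k - 1) * (q ** (k + 1) - 1)


def check_37(m, q, f):
    """Return both sides of (37), computed exactly, for a test function f."""
    lhs = sum(p34(k, m, q) * a(k + 1, m, q) * f(k + 1) for k in range(m + 1))
    rhs = sum(p34(k, m, q) * b(k, q) * f(k) for k in range(m + 1))
    return lhs, rhs


lhs, rhs = check_37(3, 3, lambda x: Fraction(1, x + 2))
print(lhs == rhs)  # True
```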

Lemma 6.4 calculates the expected value of $q^{Q_n}$.
Lemma 6.4. If $Q_n \sim \mathcal{Q}_{q,n}$, then
$$E[q^{Q_n}] = 1 + \frac{1}{q} - \frac{1}{q^{(n+1)/2}}.$$

Proof. Let $c_k = E[q^{kQ_n}]$, and set $f(x) = q^{kx}$ in (37). Elementary manipulations yield the recurrence
$$qc_{k+2} = \left(q + 1 - q^{k+1-(n-1)/2}\right)c_{k+1} + (q^{k+1} - 1)c_k.$$
The result now follows by setting $k = -1$ and using that $c_0 = 1$. □
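Lemma 6.4 and the recurrence in its proof can be checked exactly for small odd $n$. This is a hypothetical verification script, not the authors' method; the helpers `N_odd` and `p34` encode (34) directly so that the block runs on its own, and `recurrence_gap` returns zero exactly when the displayed recurrence holds.

```python
from fractions import Fraction


def N_odd(m, h, q):
    """N(n, 2h) for n = 2m + 1 (product formula preceding (34))."""
    val = Fraction(1)
    for j in range(m - h + 1):
        val *= Fraction(q ** (m + 1) - q**j, q ** (m - h + 1) - q**j)
    for i in range(h):
        val *= q**m - q**i
    return val


def p34(k, m, q):
    """Mass function (34) with n = 2m + 1."""
    return N_odd(m, m - k, q) / Fraction(q) ** (m * m + m)


def c(j, m, q):
    """c_j = E[q^{j Q_n}] under (34); j may be negative."""
    return sum(p34(k, m, q) * Fraction(q) ** (j * k) for k in range(m + 1))


def recurrence_gap(k, m, q):
    """q c_{k+2} minus the right-hand side of the recurrence (with (n-1)/2 = m)."""
    lhs = q * c(k + 2, m, q)
    rhs = (q + 1 - Fraction(q) ** (k + 1 - m)) * c(k + 1, m, q) + (Fraction(q) ** (k + 1) - 1) * c(k, m, q)
    return lhs - rhs


for q in (2, 3, 7):
    for m in range(4):
        n = 2 * m + 1
        claimed = 1 + Fraction(1, q) - Fraction(1, q ** ((n + 1) // 2))
        print(q, n, c(1, m, q) == claimed)
```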

In the remainder of this section we consider the Stein equation (7) for the target distribution $\mathcal{Q}_q$ with
$$a(k) = q \quad\text{and}\quad b(k) = (q^k - 1)(q^{k+1} - 1), \tag{38}$$
and for $A \subset \mathbb{N}_0$ we let $f_A$ denote the solution (9) when $h(k) = \mathbf{1}(k \in A)$.
Lemma 6.5. The function $f_A$ satisfies
$$\sup_{A \subset \mathbb{N}_0,\, k \ge 1}|f_A(k)| \le \frac{2}{q^3}.$$

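Proof. By Lemma 2.2 and (35),
$$|f_A(1)| \le \frac{P(Q \ge 1)}{q} = \frac{1 - p_0}{q} = \frac{1 - \prod_{i \ge 2}(1 - 1/q^i)}{q} \le \frac{1 - \left(1 - \sum_{i \ge 2}1/q^i\right)}{q} = \frac{1}{q^3(1 - 1/q)} \le \frac{2}{q^3},$$
where we have applied (11) in the second inequality, and used that $q \ge 2$.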
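For $k \ge 1$, by (10),
$$
\begin{aligned}
|f_A(k+1)| &\le \frac{P(Q \in U_k^c)}{qp_k}\\
&= q^{k^2+k-1}\left(1 - \frac{1}{q^{k+1}}\right)\prod_{i=1}^{k}\left(1 - \frac{1}{q^i}\right)^2\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2+l}\left(1 - \frac{1}{q^{l+1}}\right)\prod_{i=1}^{l}\left(1 - \frac{1}{q^i}\right)^2}\\
&\le q^{k^2+k-1}\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2+l}\prod_{i=k+1}^{l}(1 - 1/q^i)^2}\\
&\le \frac{q^{k^2+k-1}}{\left(1 - \sum_{i \ge k+1}q^{-i}\right)^2}\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2+l}}\\
&\le \frac{4}{q}\sum_{l=1}^{\infty}\frac{1}{q^{l^2+l+2kl}} \le \frac{4}{q}\sum_{l=1}^{\infty}\frac{1}{q^{4l}} = \frac{4}{q(q^4 - 1)} \le \frac{2}{q^3},
\end{aligned}
$$
where (11) was applied in the third inequality. The fourth inequality uses $1 - \sum_{i \ge k+1}q^{-i} = 1 - \frac{1}{q^k(q-1)} \ge \frac12$ together with the substitution $l \mapsto l + k$, and the last two use $k \ge 1$ and $q \ge 2$. □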
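The two intermediate bounds in this proof, $(1-p_0)/q$ and $P(Q \ge k+1)/(qp_k)$, can be evaluated numerically under (35). The sketch below does this in floating point, computing $p_l/p_k$ through the ratio $p_l/p_{l-1} = q/b(l)$ from the proof of Lemma 6.3 and truncating the infinite product and sum. It checks the bounds used in the argument, not $f_A$ itself, and the function names are ours.

```python
import math


def b(k, q):
    """b(k) = (q^k - 1)(q^{k+1} - 1) from (38)."""
    return (q**k - 1) * (q ** (k + 1) - 1)


def p0(q, terms=200):
    """p_0 under (35), i.e. prod_{i >= 2} (1 - q^{-i}), truncated after `terms` factors."""
    return math.prod(1 - q ** (-i) for i in range(2, terms))


def tail_over_qpk(k, q, terms=40):
    """P(Q >= k+1) / (q p_k) under (35), summing p_l / p_k for l = k+1, ..., k+terms."""
    total, ratio = 0.0, 1.0
    for l in range(k + 1, k + 1 + terms):
        ratio *= q / b(l, q)  # p_l / p_{l-1} = q / b(l)
        total += ratio
    return total / q


for q in (2, 3, 4, 5, 7):
    bound = 2 / q**3
    f1 = (1 - p0(q)) / q  # bound on |f_A(1)|
    fk = max(tail_over_qpk(k, q) for k in range(1, 15))  # bounds on |f_A(k+1)|, k >= 1
    print(q, f1 <= bound, fk <= bound)
```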

We now present the proof of Theorem 6.2.
Proof. From the formula (34) for $p_{0,n}$ one computes that
$$p_{0,n} = (1 - 1/q^2)(1 - 1/q^3)\cdots(1 - 1/q^{(n+1)/2}).$$
Thus, using (35), the total variation distance between $\mathcal{Q}_{q,n}$ and $\mathcal{Q}_q$ is at least
$$
\begin{aligned}
\tfrac12[p_{0,n} - p_0] &\ge \tfrac12\left[(1 - 1/q^2)(1 - 1/q^3)\cdots(1 - 1/q^{(n+1)/2})\right] - \tfrac12\left[(1 - 1/q^2)(1 - 1/q^3)\cdots(1 - 1/q^{(n+3)/2})\right]\\
&= \frac{(1 - 1/q^2)(1 - 1/q^3)\cdots(1 - 1/q^{(n+1)/2})}{2q^{(n+3)/2}}.
\end{aligned}
$$

By part 1 of Lemma 2.3,
$$(1 - 1/q^2)(1 - 1/q^3)\cdots(1 - 1/q^{(n+1)/2}) = \frac{(1 - 1/q)(1 - 1/q^2)\cdots(1 - 1/q^{(n+1)/2})}{1 - 1/q} \ge \frac{1 - 1/q - 1/q^2}{1 - 1/q} \ge \frac12.$$

It follows that the total variation distance between $\mathcal{Q}_{q,n}$ and $\mathcal{Q}_q$ is at least $1/(4q^{(n+3)/2})$.

For the upper bound, arguing as in Theorem 1.1,
$$
\begin{aligned}
|P(Q_n \in A) - P(Q \in A)| &= |E[h_A(Q_n)] - \mathcal{Q}_q h_A|\\
&= |E[qf_A(Q_n+1) - (q^{Q_n} - 1)(q^{Q_n+1} - 1)f_A(Q_n)]|\\
&= |E[q^{Q_n+1-(n-1)/2}f_A(Q_n+1)]|\\
&\le \sup_{k \ge 1}|f_A(k)|\; E\left[q^{Q_n+1-(n-1)/2}\right].
\end{aligned}
$$
By Lemmas 6.4 and 6.5, this quantity is at most
$$\frac{2}{q^3}\,q^{1-(n-1)/2}\left(1 + \frac1q\right) \le \frac{3}{q^{(n+3)/2}}.$$
□
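A direct numerical look at Theorem 6.2: the sketch below computes $\|\mathcal{Q}_{q,n} - \mathcal{Q}_q\|_{TV}$ as half the $\ell^1$ distance between (34) and a truncated version of (35), and compares it with the two bounds. It is an illustration under our reading of (34) and (35); the tail of (35) is cut off after a few terms (where it is far below double precision), and all names are hypothetical.

```python
import math
from fractions import Fraction


def N_odd(m, h, q):
    """N(n, 2h) for n = 2m + 1 (product formula preceding (34))."""
    val = Fraction(1)
    for j in range(m - h + 1):
        val *= Fraction(q ** (m + 1) - q**j, q ** (m - h + 1) - q**j)
    for i in range(h):
        val *= q**m - q**i
    return val


def p34(k, m, q):
    """Mass function (34) with n = 2m + 1."""
    return N_odd(m, m - k, q) / Fraction(q) ** (m * m + m)


def p35(k, q, terms=200):
    """Limit law (35), with the infinite product truncated after `terms` factors."""
    x = 1.0 / q
    num = math.prod(1 - x**i for i in range(1, terms))
    den = (1 - x ** (k + 1)) * math.prod((1 - x**i) ** 2 for i in range(1, k + 1))
    return num * x ** (k * (k + 1)) / den


def tv_distance(n, q, extra=15):
    """Half the l1 distance between (34) and (35) for odd n = 2m + 1."""
    m = (n - 1) // 2
    diff = 0.0
    for k in range(m + 1 + extra):
        pkn = float(p34(k, m, q)) if k <= m else 0.0
        diff += abs(pkn - p35(k, q))
    return diff / 2


for q in (2, 3):
    for n in (1, 3, 5, 7):
        scale = q ** ((n + 3) / 2)
        print(q, n, 1 / (4 * scale) <= tv_distance(n, q) <= 3 / scale)
```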



7. Hermitian matrices over finite fields

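Let $q$ be odd. Suppose that $\theta \in \mathbb{F}_{q^2}$, $\theta^2 \in \mathbb{F}_q$, but $\theta \notin \mathbb{F}_q$. Then any $\alpha \in \mathbb{F}_{q^2}$ can be written $\alpha = a + b\theta$ with $a, b \in \mathbb{F}_q$. By the conjugate of $\alpha$ we mean $\bar{\alpha} = a - b\theta$. If $A = (\alpha_{ij})$ is a square matrix with $\alpha_{ij} \in \mathbb{F}_{q^2}$, let $A^* = \bar{A}' = (\bar{\alpha}_{ij})'$, where the prime denotes transpose. Then $A$ is said to be Hermitian if and only if $A^* = A$.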
By [8], for $q$ odd the total number of $n \times n$ Hermitian matrices over $\mathbb{F}_{q^2}$ is $q^{n^2}$, and the total number of such matrices with rank $r$ is
$$N(n, r) = q^{\binom{r}{2}}\prod_{i=1}^{r}\frac{q^{2n-2(i-1)} - 1}{q^i - (-1)^i}.$$

Hence, the proportion of such matrices with rank $n - k$ is given by
$$p_{k,n} = \frac{N(n, n-k)}{q^{n^2}}, \quad k \in U_n = \{0, \dots, n\}. \tag{39}$$
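Before turning to the Stein argument, the count $N(n,r)$ and the law (39) can be sanity checked: the counts should add up to $q^{n^2}$, $p_{0,n}$ should factor as in the proof of Theorem 7.1 below, and the ratio computed in the proof of Lemma 7.2 below should hold. The sketch is our own exact-arithmetic check; `N_herm`, `p39` and `p0n_product` are hypothetical names.

```python
from fractions import Fraction


def N_herm(n, r, q):
    """Number of n x n Hermitian matrices of rank r, per the formula quoted from [8]."""
    val = Fraction(q) ** (r * (r - 1) // 2)
    for i in range(1, r + 1):
        val *= Fraction(q ** (2 * n - 2 * (i - 1)) - 1, q**i - (-1) ** i)
    return val


def p39(k, n, q):
    """Mass function (39): proportion of rank n - k."""
    return N_herm(n, n - k, q) / Fraction(q) ** (n * n)


def p0n_product(n, q):
    """Product over odd i <= n of (1 - q^-i) times product over even i <= n of (1 + q^-i)."""
    val = Fraction(1)
    for i in range(1, n + 1):
        if i % 2:
            val *= 1 - Fraction(1, q**i)
        else:
            val *= 1 + Fraction(1, q**i)
    return val


for q in (3, 5):
    for n in range(1, 5):
        total = sum(N_herm(n, r, q) for r in range(n + 1))
        print(q, n, total == q ** (n * n), p39(0, n, q) == p0n_product(n, q))
```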

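In this section we compute total variation bounds between the distribution (39), denoted $\mathcal{Q}_{q,n}$, and the distribution
$$p_k = \frac{\prod_{i \text{ odd}}(1 + 1/q^i)^{-1}}{q^{k^2}\prod_{i=1}^{k}(1 - 1/q^{2i})}, \quad k \in \mathbb{N}_0, \tag{40}$$
which we denote here by $\mathcal{Q}_q$.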

Remark: The distribution (40) also arises as a limiting law in the study of the dimension of the fixed space of a random element of the finite unitary group $U(n,q)$. More precisely, the paper [25] proves that for $k$ fixed, the chance that a uniformly chosen random element of $U(n,q)$ has a $k$-dimensional fixed space tends to $p_k$ as $n \to \infty$. See [16] for another proof.

The main theorem of this section is the following result.

Theorem 7.1. For all $n \ge 1$ and $q \ge 2$ we have
$$\frac{.07}{q^{n+1}} \le \|\mathcal{Q}_{q,n} - \mathcal{Q}_q\|_{TV} \le \frac{2.3}{q^{n+1}}.$$

The following lemma characterizes the two distributions of interest in this section.

Lemma 7.2. If $Q$ has the $\mathcal{Q}_q$ distribution, then
$$E[qf(Q+1)] = E[(q^{2Q} - 1)f(Q)] \tag{41}$$
for all functions $f$ for which these expectations exist.
If $Q_n$ has the $\mathcal{Q}_{q,n}$ distribution, then
$$E\left[\left(q - (-1)^{n-Q_n}q^{Q_n-n+1}\right)f(Q_n+1)\right] = E\left[(q^{2Q_n} - 1)f(Q_n)\right] \tag{42}$$
for all functions $f$ for which these expectations exist.

Proof. For the first assertion, one calculates from (40) that
$$\frac{p_{k-1}}{p_k} = \frac{q^{2k} - 1}{q} \quad\text{for } k \in \mathbb{N}.$$
Taking $a(k) = q$ and $b(k) = q^{2k} - 1$ in Lemma 2.1, the first assertion follows.
For the second assertion, one calculates that
$$\frac{p_{k-1,n}}{p_{k,n}} = \frac{N(n, n-k+1)}{N(n, n-k)} = \frac{q^{2k} - 1}{q - (-1)^{n-k+1}q^{k-n}}.$$
Taking $a(k) = q - (-1)^{n-k+1}q^{k-n}$ and $b(k) = q^{2k} - 1$ in Lemma 2.1, and noting $a(n+1) = 0$, the second assertion follows. □

Next we handle the moment $E[q^{Q_n}]$. Unlike all our other moment computations, where we obtain equality, here we derive an upper bound.

Lemma 7.3. If $Q_n$ has the $\mathcal{Q}_{q,n}$ distribution, then
$$E(q^{Q_n}) \le 2 + q^{-n}.$$

Proof. Setting $f(x) = q^{-x}$ in (42) implies that
$$E\left[q^{-Q_n} - (-1)^{n-Q_n}q^{-n}\right] = E\left[q^{Q_n} - q^{-Q_n}\right].$$

Thus
$$E[q^{Q_n}] = E\left[2q^{-Q_n} - (-1)^{n-Q_n}q^{-n}\right] \le 2 + q^{-n}.$$

□
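Lemma 7.3 can be checked exactly for small $n$: the sketch below evaluates $E[q^{Q_n}]$ under (39) and both sides of (42) with $f(x) = q^{-x}$. Helper names are ours, and this is a check, not part of the proof.

```python
from fractions import Fraction


def N_herm(n, r, q):
    """Number of n x n Hermitian matrices of rank r (formula quoted from [8])."""
    val = Fraction(q) ** (r * (r - 1) // 2)
    for i in range(1, r + 1):
        val *= Fraction(q ** (2 * n - 2 * (i - 1)) - 1, q**i - (-1) ** i)
    return val


def p39(k, n, q):
    """Mass function (39)."""
    return N_herm(n, n - k, q) / Fraction(q) ** (n * n)


def moments(n, q):
    """Return E[q^{Q_n}] and the two sides of (42) with f(x) = q^{-x}, exactly."""
    ps = [p39(k, n, q) for k in range(n + 1)]
    e_q = sum(p * q**k for k, p in enumerate(ps))
    lhs = sum(p * (Fraction(1, q**k) - (-1) ** (n - k) * Fraction(1, q**n)) for k, p in enumerate(ps))
    rhs = sum(p * (q**k - Fraction(1, q**k)) for k, p in enumerate(ps))
    return e_q, lhs, rhs


for q in (3, 5):
    for n in range(1, 5):
        e_q, lhs, rhs = moments(n, q)
        print(q, n, lhs == rhs, e_q <= 2 + Fraction(1, q**n))
```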

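In the remainder of this section we consider the Stein equation (7) for the target distribution $\mathcal{Q}_q$ with
$$a(k) = q \quad\text{and}\quad b(k) = q^{2k} - 1, \tag{43}$$
and for $A \subset \mathbb{N}_0$ we let $f_A$ denote the solution (9) when $h(k) = \mathbf{1}(k \in A)$.
Our next task is to provide a bound on $f_A$. In the following we will apply the identity
$$\prod_{i \text{ odd}}(1 - 1/q^i)\prod_{i \text{ even}}(1 + 1/q^i) = \prod_{i \text{ odd}}\frac{1}{1 + 1/q^i}, \tag{44}$$
which holds since, by Euler's identity $\prod_{j \ge 1}(1 + x^j) = \prod_{j \text{ odd}}(1 - x^j)^{-1}$ applied with $x = 1/q^2$,
$$\prod_{i \text{ even}}(1 + 1/q^i) = \prod_{j \text{ odd}}\frac{1}{1 - 1/q^{2j}} = \prod_{j \text{ odd}}\frac{1}{(1 - 1/q^j)(1 + 1/q^j)}.$$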
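Identity (44) is easy to test numerically with truncated products. The short sketch below (hypothetical helper names, floating point, 200 factors) compares both sides for a few values of $q$.

```python
import math


def lhs44(q, terms=200):
    """prod over odd i of (1 - q^-i) times prod over even i of (1 + q^-i), truncated."""
    return math.prod((1 - q ** (-i)) if i % 2 else (1 + q ** (-i)) for i in range(1, terms))


def rhs44(q, terms=200):
    """prod over odd i of 1 / (1 + q^-i), truncated."""
    return math.prod(1 / (1 + q ** (-i)) for i in range(1, terms, 2))


for q in (2, 3, 5, 7):
    print(q, lhs44(q), rhs44(q))
```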

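Lemma 7.4. The function $f_A$ satisfies
$$\sup_{A \subset \mathbb{N}_0}|f_A(1)| \le \frac{1.1}{q^2} \quad\text{and}\quad \sup_{A \subset \mathbb{N}_0,\, k \ge 2}|f_A(k)| \le \frac{1.8}{q^4} \quad\text{for all } q \ge 2.$$

Proof. By Lemma 2.2,
$$|f_A(1)| \le \frac{P(Q \ge 1)}{q} = \frac{1 - p_0}{q}.$$

By (40), (44) and the third claim of Lemma 2.3,
$$p_0 = \prod_{i \text{ odd}}(1 - 1/q^i)\prod_{i \text{ even}}(1 + 1/q^i) \ge (1 - 1/q - 1/q^3)(1 + 1/q^2).$$
Expanding the right-hand side and using $2/q^3 \le 1/q^2$ for $q \ge 2$, we obtain $1 - p_0 \le 1/q + 1/q^5 \le 1.1/q$, and hence $|f_A(1)| \le 1.1/q^2$, for all $q \ge 2$.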
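For $k \ge 1$, by (10),
$$|f_A(k+1)| \le \frac{P(Q \ge k+1)}{qp_k} = q^{k^2-1}\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2}\prod_{i=k+1}^{l}(1 - 1/q^{2i})} \le \frac{q^{k^2-1}}{\prod_{i=k+1}^{\infty}(1 - 1/q^{2i})}\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2}}. \tag{45}$$

Since $k \ge 1$, using (11) we have that
$$\frac{1}{\prod_{i=k+1}^{\infty}(1 - 1/q^{2i})} \le \frac{1}{\prod_{i=2}^{\infty}(1 - 1/q^{2i})} \le \frac{1}{1 - \sum_{i=2}^{\infty}q^{-2i}} = d_q, \quad\text{where } d_q = \frac{q^4 - q^2}{q^4 - q^2 - 1}. \tag{46}$$

Thus, from (45),
$$|f_A(k+1)| \le d_q\,q^{k^2-1}\sum_{l=k+1}^{\infty}\frac{1}{q^{l^2}} = \frac{d_q}{q^{2k+2}}\sum_{l=0}^{\infty}\frac{1}{q^{l^2+2l(k+1)}} \le \frac{d_q s_q}{q^4}, \quad\text{where } s_q = \sum_{l=0}^{\infty}\frac{1}{q^{l^2}}, \tag{47}$$
using $k \ge 1$ in the final inequality. Now using that $d_q$ and $s_q$ are decreasing for $q \ge 2$, and that
$$d_2 = \frac{12}{11} \quad\text{and}\quad s_2 = \sum_{l=0}^{\infty}\frac{1}{2^{l^2}} \le 1 + \frac12 + \sum_{l=2}^{\infty}\frac{1}{2^{2l}} = \frac{19}{12},$$
so that $d_q s_q \le 19/11 < 1.8$, we obtain the second claim of the lemma. □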
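The numerical constants in Lemma 7.4 can be double checked with exact rational arithmetic. The sketch below computes $d_q$, an upper bound for $s_q$ (the first $L$ terms plus the crude tail bound $2q^{-L^2}$), and truncated versions of the two quantities bounded in the proof, using $p_l/p_{l-1} = q/(q^{2l}-1)$ from (41). All names are ours, and the truncations make the last two checks approximate rather than rigorous.

```python
from fractions import Fraction


def d(q):
    """d_q = 1 / (1 - sum_{i >= 2} q^{-2i}) from (46), in closed form."""
    return Fraction(q**4 - q**2, q**4 - q**2 - 1)


def s_upper(q, L=8):
    """Upper bound for s_q = sum_{l >= 0} q^{-l^2}: first L terms plus 2 q^{-L^2} for the tail."""
    return sum(Fraction(1, q ** (l * l)) for l in range(L)) + Fraction(2, q ** (L * L))


def f1_bound(q, terms=30):
    """(1 - p_0)/q with p_0 = prod over odd i of 1/(1 + q^-i), product truncated."""
    p0 = Fraction(1)
    for i in range(1, 2 * terms, 2):
        p0 /= 1 + Fraction(1, q**i)
    return (1 - p0) / q


def fk_bound(k, q, terms=10):
    """P(Q >= k+1)/(q p_k) under (40), via p_l / p_{l-1} = q / (q^{2l} - 1); truncated."""
    total, ratio = Fraction(0), Fraction(1)
    for l in range(k + 1, k + 1 + terms):
        ratio *= Fraction(q, q ** (2 * l) - 1)
        total += ratio
    return total / q


print(d(2), float(d(2) * s_upper(2)), float(d(3) * s_upper(3)))
```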
Now we present the proof of the main result of this section, Theorem 7.1. 
Proof. We first compute a lower bound for the case where $n$ is odd. From (39) we have
$$p_{0,n} = (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^n)(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^{n-1}).$$

By (40) and (44), since each factor $(1 + 1/q^{2j})(1 - 1/q^{2j+1})$ exceeds 1 for $q \ge 2$,
$$
\begin{aligned}
p_0 &\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^{n+2})\\
&\quad\times(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^{n+1})\\
&= (1 + 1/q^{n+1})(1 - 1/q^{n+2})\,p_{0,n}.
\end{aligned}
$$

Thus
$$
\begin{aligned}
p_0 - p_{0,n} &\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^n)\\
&\quad\times(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^{n-1})\\
&\quad\times\left[(1 + 1/q^{n+1})(1 - 1/q^{n+2}) - 1\right]\\
&\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^n)\left[(1 + 1/q^{n+1})(1 - 1/q^{n+2}) - 1\right]\\
&\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^n)\,\frac{1 - 1/q - 1/q^3}{q^{n+1}}\\
&\ge \frac{(1 - 1/q - 1/q^3)^2}{q^{n+1}}\\
&\ge \frac{.14}{q^{n+1}},
\end{aligned} \tag{48}
$$
where the fourth inequality used the third claim of Lemma 2.3. Thus the total variation distance between $\mathcal{Q}_{q,n}$ and $\mathcal{Q}_q$ is at least $\frac12[p_0 - p_{0,n}] \ge .07/q^{n+1}$.
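Now we compute a lower bound for $n$ even. From (39),
$$p_{0,n} = (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^{n-1})(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^n),$$
and by (40) and (44), since each factor $(1 - 1/q^{2j-1})(1 + 1/q^{2j})$ is less than 1,
$$
\begin{aligned}
p_0 &\le (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^{n+1})\\
&\quad\times(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^{n+2})\\
&= p_{0,n}(1 - 1/q^{n+1})(1 + 1/q^{n+2}).
\end{aligned}
$$

Thus
$$
\begin{aligned}
p_{0,n} - p_0 &\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^{n-1})\\
&\quad\times(1 + 1/q^2)(1 + 1/q^4)\cdots(1 + 1/q^n)\\
&\quad\times\left[1 - (1 - 1/q^{n+1})(1 + 1/q^{n+2})\right]\\
&\ge (1 - 1/q)(1 - 1/q^3)\cdots(1 - 1/q^{n-1})\left[1 - (1 - 1/q^{n+1})(1 + 1/q^{n+2})\right]\\
&\ge \prod_{j \text{ odd}}(1 - 1/q^j)\,\frac{1 - 1/q}{q^{n+1}}\\
&\ge \frac{(1 - 1/q)(1 - 1/q - 1/q^3)}{q^{n+1}}\\
&\ge \frac{.18}{q^{n+1}},
\end{aligned} \tag{49}
$$
where the fourth inequality used the third claim of Lemma 2.3. Thus the total variation distance between $\mathcal{Q}_{q,n}$ and $\mathcal{Q}_q$ is at least $\frac12[p_{0,n} - p_0] \ge .09/q^{n+1}$.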
For the upper bound, arguing as in the proof of Theorem 4.1,
$$
\begin{aligned}
|P(Q_n \in A) - P(Q \in A)| &= |E[h_A(Q_n)] - \mathcal{Q}_q h_A|\\
&= |E[qf_A(Q_n+1) - (q^{2Q_n} - 1)f_A(Q_n)]|\\
&= |E[(-1)^{n-Q_n}q^{-n+Q_n+1}f_A(Q_n+1)]|\\
&\le q^{-n+1}|f_A(1)|P(Q_n = 0) + E[q^{-n+Q_n+1}|f_A(Q_n+1)|\mathbf{1}(Q_n \ge 1)]\\
&\le q^{-n+1}|f_A(1)| + q^{-n+1}E[q^{Q_n}]\sup_{k \ge 2}|f_A(k)|\\
&\le q^{-(n+1)}(1.1 + 3.6q^{-2} + 1.8q^{-3})\\
&\le 2.3/q^{n+1},
\end{aligned} \tag{50}
$$
for $n \ge 1$. The third inequality used Lemmas 7.3 and 7.4. □
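As with Theorem 6.2, one can compare Theorem 7.1 with the exact total variation distance for small $n$. The sketch computes (39) exactly and (40) with a truncated product and a truncated support; it treats (39) as an algebraic formula in $q$ (so $q = 2$ is allowed, as in the statement of Theorem 7.1), and its helper names are hypothetical.

```python
import math
from fractions import Fraction


def N_herm(n, r, q):
    """Number of n x n Hermitian matrices of rank r (formula quoted from [8])."""
    val = Fraction(q) ** (r * (r - 1) // 2)
    for i in range(1, r + 1):
        val *= Fraction(q ** (2 * n - 2 * (i - 1)) - 1, q**i - (-1) ** i)
    return val


def p39(k, n, q):
    """Mass function (39)."""
    return N_herm(n, n - k, q) / Fraction(q) ** (n * n)


def p40(k, q, terms=200):
    """Limit law (40), with the infinite product over odd i truncated."""
    x = 1.0 / q
    num = math.prod(1 / (1 + x**i) for i in range(1, terms, 2))
    den = math.prod(1 - x ** (2 * i) for i in range(1, k + 1))
    return num * x ** (k * k) / den


def tv_distance(n, q, extra=12):
    """Half the l1 distance between (39) and (40), support cut at n + extra."""
    diff = 0.0
    for k in range(n + 1 + extra):
        pkn = float(p39(k, n, q)) if k <= n else 0.0
        diff += abs(pkn - p40(k, q))
    return diff / 2


for q in (2, 3):
    for n in range(1, 5):
        d = tv_distance(n, q)
        print(q, n, 0.07 / q ** (n + 1) <= d <= 2.3 / q ** (n + 1))
```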

Remark 7.5. The formula (39) for the rank distribution is valid for odd $q$, so in particular for $q \ge 3$. Over this range the bounds of Theorem 7.1 may be slightly improved by applying (46) and (47) to replace 1.8 in Lemma 7.4 by 1.4, and then using this value in (50). One may similarly improve the lower bound by replacing .14 by .38 in (48), and .18 by .41 in (49), resulting in
$$\frac{.19}{q^{n+1}} \le \|\mathcal{Q}_{q,n} - \mathcal{Q}_q\|_{TV} \le \frac{1.5}{q^{n+1}} \quad\text{for all } q \ge 3.$$
8. Appendix 

The purpose of this appendix is to give an algebraic proof of Lemma 
3.2. The proof assumes familiarity with rational canonical forms of matrices 
(that is the theory of Jordan forms over finite fields), and with cycle index 
generating functions. Background on these topics can be found in [14] or
[28], or in the survey [15]. 

Proof (of Lemma 3.2).
The sought equation is
$$\sum_{k=0}^{n}q^k p_{k,n} = 2 - 1/q^n. \tag{51}$$

From the expression for $p_{k,n}$ in (1), it is clear that if one multiplies (51) by $q^t(1 - 1/q)\cdots(1 - 1/q^n)$, where $t$ is sufficiently large as a function of $n$, then both sides become polynomials in $q$. Since polynomials in $q$ agreeing for infinitely many values of $q$ are equal, it is enough to prove the result for infinitely many values of $q$, so we demonstrate it for $q$ a prime power.

Let $M$ be an $n \times n$ matrix over $\mathbb{F}_q$. Then $n$ minus the rank of $M$ is equal to $l(\lambda_z(M))$, the number of parts in the partition corresponding to the degree one polynomial $z$ in the rational canonical form of $M$. Thus
$$E(q^{Q_n}) = \frac{1}{q^{n^2}}\sum_{M \in \mathrm{Mat}(n,q)}q^{l(\lambda_z(M))}, \tag{52}$$
where $\mathrm{Mat}(n,q)$ denotes the set of $n \times n$ matrices over the finite field $\mathbb{F}_q$.
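From the cycle index for $\mathrm{Mat}(n,q)$ (Lemma 1 of [28]), it follows that
$$1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{M \in \mathrm{Mat}(n,q)}q^{l(\lambda_z(M))} = \left(\sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z}(\lambda)}\right)\prod_{\phi \ne z}\sum_{\lambda}\frac{u^{|\lambda|\deg(\phi)}}{c_{GL,\phi}(\lambda)}. \tag{53}$$

Here $\lambda$ ranges over all partitions of all natural numbers, and $l(\lambda)$ is the number of parts of $\lambda$. The quantity $c_{GL,\phi}(\lambda)$ is a certain function of $\lambda$ and $\phi$ which depends on the polynomial $\phi$ only through its degree. The product is over all monic, irreducible polynomials $\phi$ over $\mathbb{F}_q$ other than $\phi = z$. From the cycle index for $GL(n,q)$ (Lemma 1 of [28]), it follows that
$$\prod_{\phi \ne z}\sum_{\lambda}\frac{u^{|\lambda|\deg(\phi)}}{c_{GL,\phi}(\lambda)} = 1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{\alpha \in GL(n,q)}1 = \frac{1}{1-u}. \tag{54}$$

Summarizing, it follows from (53) and (54) that
$$1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{M \in \mathrm{Mat}(n,q)}q^{l(\lambda_z(M))} = \frac{1}{1-u}\sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z}(\lambda)}. \tag{55}$$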

The next step is to compute
$$\sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z}(\lambda)} = \sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z-1}(\lambda)}.$$
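This equality holds because $c_{GL,\phi}(\lambda)$ depends on the polynomial $\phi$ only through its degree. From the cycle index of $GL(n,q)$, it follows that
$$
\begin{aligned}
1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{\alpha \in GL(n,q)}q^{l(\lambda_{z-1}(\alpha))}
&= \sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z-1}(\lambda)}\prod_{\phi \ne z,\, z-1}\sum_{\lambda}\frac{u^{|\lambda|\deg(\phi)}}{c_{GL,\phi}(\lambda)}\\
&= \frac{\sum_{\lambda}q^{l(\lambda)}u^{|\lambda|}/c_{GL,z-1}(\lambda)}{\sum_{\lambda}u^{|\lambda|}/c_{GL,z-1}(\lambda)}\prod_{\phi \ne z}\sum_{\lambda}\frac{u^{|\lambda|\deg(\phi)}}{c_{GL,\phi}(\lambda)}\\
&= \frac{1}{1-u}\cdot\frac{\sum_{\lambda}q^{l(\lambda)}u^{|\lambda|}/c_{GL,z-1}(\lambda)}{\sum_{\lambda}u^{|\lambda|}/c_{GL,z-1}(\lambda)}\\
&= \frac{\prod_{i \ge 1}(1 - u/q^i)}{1-u}\sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z-1}(\lambda)}.
\end{aligned}
$$

The third equality used (54) and the final equality is from Lemma 6 of [28] and page 19 of [1].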

Next we can use group theory to find an alternate expression for
$$1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{\alpha \in GL(n,q)}q^{l(\lambda_{z-1}(\alpha))}.$$

Indeed, by the theory of rational canonical forms, $q^{l(\lambda_{z-1}(\alpha))}$ is the number of fixed points of $\alpha$ in its action on the underlying $n$-dimensional vector space $V$. By Burnside's lemma (page 95 of [30]), the average number of fixed points of a finite group acting on a finite set is the number of orbits of the action on the set. For $GL(n,q)$ acting on $V$, there are two such orbits, consisting of the zero vector and the set of non-zero vectors. Thus
$$1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{\alpha \in GL(n,q)}q^{l(\lambda_{z-1}(\alpha))} = 1 + \sum_{n \ge 1}2u^n = \frac{1+u}{1-u}.$$



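Comparing the final equations of the previous two paragraphs gives that
$$\sum_{\lambda}\frac{q^{l(\lambda)}u^{|\lambda|}}{c_{GL,z-1}(\lambda)} = \frac{1+u}{\prod_{i \ge 1}(1 - u/q^i)}. \tag{57}$$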



It follows from (55) and (57) that
$$1 + \sum_{n \ge 1}\frac{u^n}{|GL(n,q)|}\sum_{M \in \mathrm{Mat}(n,q)}q^{l(\lambda_z(M))} = \frac{1+u}{1-u}\prod_{i \ge 1}\frac{1}{1 - u/q^i}.$$
Thus by (52), $E(q^{Q_n})$ is $\frac{|GL(n,q)|}{q^{n^2}}$ multiplied by the coefficient of $u^n$ in
$$\frac{1+u}{1-u}\prod_{i \ge 1}\frac{1}{1 - u/q^i}.$$

From page 19 of [1], the coefficient of $u^n$ in
$$\frac{1}{1-u}\prod_{i \ge 1}\frac{1}{1 - u/q^i}$$
is equal to $[(1 - 1/q)(1 - 1/q^2)\cdots(1 - 1/q^n)]^{-1}$. Thus,
$$E(q^{Q_n}) = \frac{|GL(n,q)|}{q^{n^2}}\left[\frac{1}{(1 - 1/q)\cdots(1 - 1/q^n)} + \frac{1}{(1 - 1/q)\cdots(1 - 1/q^{n-1})}\right] = 2 - \frac{1}{q^n},$$
where the last equality used that $|GL(n,q)| = q^{n^2}(1 - 1/q)\cdots(1 - 1/q^n)$. □
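For small cases, both Lemma 3.2 and the Burnside step above can be confirmed by brute force. The sketch below is our own and is restricted to prime $q$, so that arithmetic mod $q$ is field arithmetic; it enumerates all of $\mathrm{Mat}(n,q)$, averages $q^{n - \mathrm{rank}(M)}$, and separately averages the number of fixed vectors $q^{n - \mathrm{rank}(\alpha - I)}$ over $GL(n,q)$.

```python
from fractions import Fraction
from itertools import product


def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(u - f * v) % p for u, v in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def matrices(n, p):
    """All n x n matrices over F_p, as lists of row tuples."""
    for entries in product(range(p), repeat=n * n):
        yield [entries[i * n:(i + 1) * n] for i in range(n)]


def mean_q_corank(n, p):
    """Average of p^(n - rank M) over Mat(n, p), i.e. E[q^{Q_n}] with q = p."""
    vals = [Fraction(p) ** (n - rank_mod_p(M, p)) for M in matrices(n, p)]
    return sum(vals) / len(vals)


def mean_fixed_points(n, p):
    """Average number of fixed vectors of A over A in GL(n, p)."""
    total, count = 0, 0
    for A in matrices(n, p):
        if rank_mod_p(A, p) == n:
            a_minus_i = [[(A[i][j] - (i == j)) % p for j in range(n)] for i in range(n)]
            total += p ** (n - rank_mod_p(a_minus_i, p))
            count += 1
    return Fraction(total, count)


print(mean_q_corank(2, 2), mean_fixed_points(2, 2))  # 7/4 2
```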

We close this section with two remarks about the distribution $\mathcal{Q}_{q,n}$ in (1) from the introduction.

• From [3], there is a natural Markov chain on $\{0, 1, \dots, n\}$ which has $\mathcal{Q}_{q,n}$ as its stationary distribution. This chain has transition probabilities
$$M(i, i+1) = \frac{q^{n-i-1}(q^{n-i} - 1)}{(q^n - 1)^2}, \qquad M(i, i-1) = \frac{(q^n - q^{n-i})^2}{(q^n - 1)^2},$$
$$M(i, i) = 1 - M(i, i-1) - M(i, i+1).$$
This Markov chain describes how the rank of a matrix evolves by adding a uniformly chosen rank one matrix at each step.

• It is known (page 338 of [30]) that the number of $n \times n$ matrices over the finite field $\mathbb{F}_q$ with rank $r$ is equal to
$$\binom{n}{r}_q\sum_{k=0}^{r}(-1)^{r-k}\binom{r}{k}_q\,q^{nk + \binom{r-k}{2}}.$$
Here
$$\binom{n}{m}_q = \frac{(q^n - 1)(q^{n-1} - 1)\cdots(q^{n-m+1} - 1)}{(q^m - 1)(q^{m-1} - 1)\cdots(q - 1)}$$
is the $q$-binomial coefficient.

Since the proportion of $n \times n$ matrices over $\mathbb{F}_q$ with rank $r$ is also given by $p_{n-r,n}$ of (1), we obtain the following corollary, which we have not seen stated in the literature.
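Corollary 8.1. For $0 \le r \le n$,
$$\frac{1}{q^{(n-r)^2}}\,\frac{\prod_{j=n-r+1}^{n}(1 - q^{-j})^2}{\prod_{j=1}^{r}(1 - q^{-j})} = \frac{1}{q^{n^2}}\binom{n}{r}_q\sum_{k=0}^{r}(-1)^{r-k}\binom{r}{k}_q\,q^{nk + \binom{r-k}{2}}.$$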
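Both closing remarks can be checked in exact arithmetic for small $n$. The sketch below verifies Corollary 8.1 and that $\mathcal{Q}_{q,n}$ is stationary for the chain with the transition probabilities displayed above. It assumes that (1) has the closed form $p_{k,n} = q^{-k^2}\prod_{i=k+1}^{n}(1-q^{-i})^2/\prod_{i=1}^{n-k}(1-q^{-i})$; `qbinom`, `p1`, `rank_count`, `transition` and `is_stationary` are hypothetical names.

```python
from fractions import Fraction


def qbinom(n, m, q):
    """Gaussian binomial coefficient [n choose m]_q (an integer)."""
    if m < 0 or m > n:
        return 0
    num = den = 1
    for i in range(m):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den


def p1(k, n, q):
    """Assumed closed form of (1): P(n - rank = k) for a uniform n x n matrix."""
    x = Fraction(1, q)
    num = Fraction(1)
    for i in range(k + 1, n + 1):
        num *= (1 - x**i) ** 2
    den = Fraction(q) ** (k * k)
    for i in range(1, n - k + 1):
        den *= 1 - x**i
    return num / den


def rank_count(n, r, q):
    """Number of n x n matrices of rank r, by the alternating-sum formula above."""
    inner = sum(
        (-1) ** (r - k) * qbinom(r, k, q) * q ** (n * k + (r - k) * (r - k - 1) // 2)
        for k in range(r + 1)
    )
    return qbinom(n, r, q) * inner


def transition(n, q):
    """M(i, j) for the rank-one-update chain; the state i is n minus the rank."""
    den = (q**n - 1) ** 2
    M = {}
    for i in range(n + 1):
        up = Fraction(q ** (n - i - 1) * (q ** (n - i) - 1), den) if i < n else Fraction(0)
        down = Fraction((q**n - q ** (n - i)) ** 2, den)
        M[(i, i + 1)], M[(i, i - 1)], M[(i, i)] = up, down, 1 - up - down
    return M


def is_stationary(n, q):
    """Check sum_i pi(i) M(i, j) == pi(j) for pi = Q_{q,n}."""
    pi = [p1(k, n, q) for k in range(n + 1)]
    M = transition(n, q)
    return all(
        sum(pi[i] * M.get((i, j), 0) for i in range(n + 1)) == pi[j]
        for j in range(n + 1)
    )


print(rank_count(2, 2, 2), is_stationary(3, 2))  # 6 True
```

Exact fractions are used so that stationarity is checked as an identity rather than up to rounding.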